Hot Topic Extraction and Public Opinion Classification of Tibetan Texts

Authors

  • Guixian XU
  • Lirong QIU
Abstract

The growing volume of Tibetan-language information has made Tibetan text processing increasingly important. This study investigates Tibetan hot topic extraction and public opinion classification to advance Tibetan information processing. Hot topic extraction proceeds in four steps. First, Tibetan word segmentation is performed. Second, feature selection based on term frequency and on document frequency is applied to reduce the feature dimensionality. Third, a vector space model is used to represent the texts. Finally, a statistical method is used to extract hot topics. Public opinion classification requires a keyword table of public opinion: 18 classes were selected by field, and the keyword table was constructed by domain experts. Classification is then performed on the basis of the proposed similarity computation method. An application system was developed from the proposed approaches and used to carry out the experiments. The experiments show that the proposed method can extract topics effectively and classify public opinion rapidly. This research is useful for text classification, information retrieval, and the construction of high-quality corpora.

Subject Categories and Descriptors: I.2.7 [Natural Language Processing]: Text Analysis; I.5.4 [Pattern Recognition]: Applications – Information Extraction

General Terms: Data Mining, Information Extraction, Knowledge Management
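The steps named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual similarity computation method or its statistical topic-extraction method, so the code below substitutes plain cosine similarity over term-frequency vectors and a simple document-frequency threshold; both are assumptions, not the authors' method. The function names, the `min_df` parameter, the two class labels, and the English placeholder tokens (standing in for already-segmented Tibetan words) are all hypothetical.

```python
import math
from collections import Counter


def select_features(docs, min_df=2):
    """Keep terms whose document frequency is at least min_df.

    docs is a list of token lists (text assumed to be already segmented).
    The threshold rule is an assumption made for illustration.
    """
    df = Counter(term for doc in docs for term in set(doc))
    return sorted(term for term, count in df.items() if count >= min_df)


def to_vector(tokens, vocab):
    """Vector space model: term-frequency vector over a fixed vocabulary."""
    tf = Counter(tokens)
    return [tf[term] for term in vocab]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify(tokens, keyword_table):
    """Assign the class whose keyword list is most similar to the document.

    keyword_table maps a class label to its list of keywords
    (a stand-in for the expert-built keyword table).
    """
    vocab = sorted({kw for kws in keyword_table.values() for kw in kws})
    doc_vec = to_vector(tokens, vocab)
    scores = {
        label: cosine(doc_vec, to_vector(kws, vocab))
        for label, kws in keyword_table.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical data: two of the 18 classes, with placeholder keywords.
    table = {
        "politics": ["election", "vote", "policy"],
        "education": ["school", "exam", "teacher"],
    }
    label, scores = classify(["vote", "school", "exam"], table)
    print(label, scores)
```

In this sketch a document is scored against every class and assigned to the highest-scoring one; a real system would likely also need a minimum-similarity cutoff for documents that match no class, which the abstract does not describe.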


Similar resources

Research on Tibetan Text Orientation Identification

In recent years, minority languages in China have been widely used on computers and networks. However, there is currently no effective public opinion analysis system for the overall attitudes of minority populations toward hot events or topics. In this study, we investigate Tibetan topic orientation recognition. First, according to the Tibetan context and life characteristics, combined with a set of emotional ...


Study on Hot Topic Discovery from Chinese Texts

With the development of information technology, there has been an increased popularity in the use of electronic texts. Topic detection and tracking can identify hot information from isolated texts. Obtaining hot topics has become an important issue in recent years. The combination of statistics and natural language processing was utilized in the current study to discover hot topics from texts. ...


Automatic Recognition of Tibetan Buddhist Text by Computer

The purpose of this study is to develop a plausible method to code and compile Buddhist texts automatically from original Tibetan scripts into the Romanized form. We extract syllables from Tibetan texts and automatically recognize the Tibetan characters. The set of Tibetan characters consists of 30 basic consonants, 76 combination characters, and 4 vowels. Despite the limited number of Tibeta...


Research on the Location and Extraction of Texts in Complex Background

Research on the location and extraction of text in complex backgrounds is of considerable significance in the current information age. It has enriched image processing theory and shows great business value in practical applications such as image and video search on the Internet and license plate recognition in modern traffic management. How to rapidly and accurately locate and extr...


Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation

Purpose: This study investigates the automatic keyword extraction from the table of contents of Persian e-books in the field of science using LDA topic modeling, evaluating their similarity with golden standard, and users' viewpoints of the model keywords. Methodology: This is a mixed text-mining research in which LDA topic modeling is used to extract keywords from the table of contents of sci...



Publication date: 2016